

AgRISTARS 


E81-10186 

SR-L1-04031

JSC-16853 

JAN 2 1 1981 




Supporting Research 


A Joint Program for 
Agriculture and 
Resources Inventory 
Surveys Through 
Aerospace 
Remote Sensing 

January 1981 



MAXIMUM LIKELIHOOD CLUSTERING WITH DEPENDENT 
FEATURE TREES 


C. B. Chittineni 



(E81-10186) MAXIMUM LIKELIHOOD CLUSTERING
WITH DEPENDENT FEATURE TREES (Lockheed
Engineering and Management) 54 p
HC A04/MF A01 CSCL 12A

N81-29502
Unclas
G3/43 00186


Lockheed Engineering and Management Services Company, Inc.
1830 NASA Road 1, Houston, Texas 77058



Lyndon B. Johnson Space Center 

Houston, Texas 77058 



1. Report No.: SR-L1-04031, JSC-16853

2. Government Accession No.:

3. Recipient's Catalog No.:

4. Title and Subtitle: Maximum Likelihood Clustering With Dependent Feature Trees

5. Report Date: January 1981

6. Performing Organization Code:

7. Author(s): C. B. Chittineni, Lockheed Engineering and Management Services Company, Inc.

8. Performing Organization Report No.: LEMSCO-15683

9. Performing Organization Name and Address:
Lockheed Engineering and Management Services Company, Inc.
1830 NASA Road 1
Houston, Texas 77058

10. Work Unit No.:

11. Contract or Grant No.: NAS 9-15800

12. Sponsoring Agency Name and Address:
National Aeronautics and Space Administration
Lyndon B. Johnson Space Center
Houston, Texas 77058 (Technical Monitor: Dr. R. Heydorn)

13. Type of Report and Period Covered: Technical Report

14. Sponsoring Agency Code:

15. Supplementary Notes:


16. Abstract

In this report, maximum likelihood clustering for the decomposition of mixture density
of the data into its normal component densities is considered. The densities are
approximated with first-order dependent feature trees using criteria of mutual
information and distance measures. Expressions are presented for the criteria when the
densities are Gaussian. By defining different types of nodes in a general dependent
feature tree, maximum likelihood equations are developed for the estimation of
parameters using fixed-point iterations. The field structure of the data is also taken
into account in developing maximum likelihood equations. Furthermore, experimental
results from the processing of remotely sensed multispectral scanner imagery data are
presented.


17. Key Words (Suggested by Author(s)):
Clustering, distance measures, dependent
feature trees, fields, fixed-point iteration
schemes, link, maximum likelihood equations,
mutual information, parameter estimation,
types of nodes

18. Distribution Statement:

19. Security Classif. (of this report): Unclassified

20. Security Classif. (of this page): Unclassified

21. No. of Pages: 54

22. Price*:

*For sale by the National Technical Information Service, Springfield, Virginia 22161


NASA - JSC





















SR-L1-04031

JSC-16853 


MAXIMUM LIKELIHOOD CLUSTERING WITH 
DEPENDENT FEATURE TREES 


Job Order 73-306 


This report describes Classification activities of the 
Supporting Research project of the AgRISTARS program. 


PREPARED BY 
C. B. Chittineni 


APPROVED BY 



T. C. Minter, Supervisor 
Techniques Development Section 



E. Wainwright, Manager
Development and Evaluation Department


LOCKHEED ENGINEERING AND MANAGEMENT SERVICES COMPANY, INC. 
Under Contract NAS 9-15800 
For 

Earth Resources Research Division 

Space and Life Sciences Directorate 

NATIONAL AERONAUTICS AND SPACE ADMINISTRATION 
LYNDON B. JOHNSON SPACE CENTER 
HOUSTON, TEXAS 

January 1981 


LEMSCO-15683 



PREFACE 


The techniques which are the subject of this report were developed to support 
the Agriculture and Resources Inventory Surveys Through Aerospace Remote 
Sensing program. Under Contract NAS 9-15800, Dr. C. B. Chittineni, a principal 
scientist for Lockheed Engineering and Management Services Company, Inc., 
performed this research for the Earth Resources Research Division, Space and 
Life Sciences Directorate, National Aeronautics and Space Administration, at 
the Lyndon B. Johnson Space Center. 





CONTENTS


Section

1. INTRODUCTION

2. GENERAL MAXIMUM LIKELIHOOD EQUATIONS

3. APPROXIMATING PROBABILITY DENSITY FUNCTIONS WITH DEPENDENT FEATURE TREES

3.1 CONSTRUCTION OF OPTIMAL DEPENDENT FEATURE TREES

3.1.1 A CRITERION BASED ON INFORMATION MEASURE

3.1.2 A CRITERION BASED ON PROBABILISTIC DISTANCE MEASURES

3.2 EXPRESSIONS FOR THE CRITERIA WHEN THE DISTRIBUTIONS OF THE FEATURES ARE GAUSSIAN

3.2.1 AN EXPRESSION FOR THE MUTUAL INFORMATION BETWEEN FEATURES $x_i$ AND $x_j$

3.2.2 AN EXPRESSION FOR $\Delta J_{12}(x_i, x_j)$ WHEN FEATURES $x_i$ AND $x_j$ ARE NORMALLY DISTRIBUTED

4. A GENERAL DEPENDENT FEATURE TREE

4.1 DIFFERENT TYPES OF NODES

4.2 AN EXPRESSION FOR THE COVARIANCE BETWEEN THE FEATURES CONNECTED BY A PATH IN A DEPENDENT FEATURE TREE

5. MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF THE DENSITY FUNCTIONS

5.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE A PRIORI PROBABILITIES, MEANS, AND VARIANCES

5.1.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE I NODES

5.1.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE II NODES

5.1.3 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE III NODES

5.1.4 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE IVA NODES

5.1.5 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE IVB NODES

5.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCES BETWEEN FEATURES

5.2.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPE I AND TYPE II NODES

5.2.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPE IVA AND TYPE II OR TYPE III NODES

5.2.3 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPES OF NODES OTHER THAN THOSE CONSIDERED IN SECTIONS 5.2.1 AND 5.2.2

6. EXPERIMENTAL RESULTS

7. CONCLUDING REMARKS

8. REFERENCES


Appendix

A. DERIVATIONS OF MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF A TYPE III FEATURE

B. MAXIMUM LIKELIHOOD EQUATIONS WITH FIELD STRUCTURE

C. DEPENDENT FEATURE TREES WITH THE NODES REPRESENTING FEATURE SUBSETS



FIGURES


Figure

3-1 An example of a dependent feature tree

4-1 A general dependent feature tree

4-2 Illustration of a type IVa node

4-3 Illustration of a type IVb node

4-4 An example dependent feature tree

4-5 A path between features $x_1$ and $x_{r+s}$ in a dependent feature tree

4-6 A path between features $x_1$ and $x_r$ in a dependent feature tree

5-1 Reduction in the number of parameters with the dimensionality

5-2 A link in a dependent feature tree with a type I node

5-3 A typical type II node in a general dependent feature tree

5-4 A typical type III node in a general dependent feature tree

5-5 A typical type IVa node in a general dependent feature tree

5-6 A typical type IVb node in a general dependent feature tree

5-7 A typical link connecting type I and type II nodes

5-8 A typical link connecting type IVa and type II or type III nodes in a general dependent feature tree

6-1 Optimal dependent feature tree of segment 1648

6-2 Optimal dependent feature tree of segment 1739

6-3 Arbitrary dependent feature tree used in the experiment

A-1 Illustration of a typical type III node in a general dependent feature tree



1. INTRODUCTION 


Recently, considerable interest has been shown in developing techniques for the
classification of imagery data (such as remotely sensed multispectral scanner
data acquired by the Landsat series of satellites) for inventorying natural
resources, monitoring crop conditions, and detecting changes in natural and
manmade objects. Nonsupervised classification or clustering techniques have
been found to be effective in the analysis of remotely sensed data (ref. 1).

The approach of clustering for imagery data classification, in general,
involves two steps: (1) partitioning the image into its inherent modes or into
its homogeneous parts and (2) labeling the clusters using information from a
given set of labeled patterns.

In practical applications of pattern recognition such as remote sensing, it is 
difficult to obtain labels for the patterns. In remote sensing imagery, an 
analyst-interpreter provides the labels for the picture elements (pixels) by 
examining imagery films and using other information (e.g., crop growth stage 
models and historic information). Remote sensing imagery usually has a field 
structure, and it is recognized that fields are easier to label than are 
pixels. The development of algorithms for locating fields has attracted the 
attention of several researchers in the recent literature (refs. 2-5). 

Considerable interest has been shown in applying maximum likelihood equations 
for the decomposition of the mixture density of the imagery data into its 
normal component densities (refs. 5-9). Recently, methods have been developed 
(refs. 10, 11) for probabilistically labeling the modes of the data using 
information from a given set of labeled patterns and, also, from a given set of 
labeled fields. 

In decomposing the mixture density of the data into its normal component
densities, the parameters of the component densities and the a priori probabilities
of the modes are iteratively computed using maximum likelihood equations coupled
with a split and merge sequence. The updating of the parameters is usually
stopped after a few iterations because of the large amount of computation.
Also, in practical problems (remote sensing imagery data of several
acquisitions), a large number of parameters will be estimated. For a fixed sample
size, the accuracy of estimation usually decreases (ref. 12) as the number of
parameters to be estimated increases. To overcome the computational
requirements and the large number of parameters to be estimated with the usual maximum
likelihood clustering technique, maximum likelihood equations are obtained in
this report by approximating the cluster conditional densities with first-order
tree dependence (refs. 13, 14) among the features. The field structure of the
data is also taken into account. Either the average mutual information between
the features (ref. 13) or the probabilistic distance measures (ref. 15) can be
used to construct optimal dependent feature trees for a given data type.

This paper is organized as follows. General maximum likelihood equations are 
presented in section 2. Section 3 concerns the problem of approximating proba- 
bility density functions with dependent feature trees using the criteria of 
information measure and probabilistic distance measure. Expressions are 
derived for the criteria when the distributions of the features are Gaussian. 

In section 4, a general dependent feature tree and its various types of nodes
are described, and expressions for the covariance between the features not
connected by a single link are derived. Maximum likelihood equations for the
parameters of the density functions when approximated by dependent feature
trees are developed in section 5. Experimental results from the processing of
remotely sensed multispectral scanner imagery data are presented in section 6.
Section 7 contains the concluding remarks. Detailed derivations of maximum
likelihood equations are given in appendix A. In appendix B, the field
structure of the data is taken into account in developing maximum likelihood
equations. An expression is derived in appendix C for the mutual information
between the feature subsets when they are represented by the nodes in a
dependent feature tree. Also, expressions are derived for the covariance
between the feature subsets when they are connected by a path in a dependent
feature tree.


2. GENERAL MAXIMUM LIKELIHOOD EQUATIONS


General maximum likelihood equations are presented in this section for the
decomposition of the mixture density of the data into its component densities.
It is assumed that a set $\mathcal{X} = \{X_1, X_2, \ldots, X_N\}$ of N unlabeled
patterns, each of dimension n, is given. These patterns are assumed to be drawn
independently from the mixture density

$$p(X|\theta) = \sum_{j=1}^{m} p(X|\omega_j, \theta_j) P(\omega_j) \tag{2-1}$$

where $\theta$ is a fixed but unknown parameter vector, $\theta_j$ is a parameter vector for
the $j$th cluster, and m is the number of modes or clusters in the data. Let
$P(\omega_j)$ and $p(X|\omega_j, \theta_j)$ be the a priori probabilities of the modes and mode
conditional densities, respectively. The likelihood of the observed pattern vectors
is, by definition, the joint density

$$p(\mathcal{X}|\theta) = \prod_{k=1}^{N} p(X_k|\theta) \tag{2-2}$$

Since the logarithm is a monotonic function of its argument, taking the
gradient of the logarithm of equation (2-2) with respect to $\theta_i$ results in

$$\nabla_{\theta_i} \ell = \sum_{k=1}^{N} \frac{1}{p(X_k|\theta)} \nabla_{\theta_i} \left[ \sum_{j=1}^{m} p(X_k|\omega_j, \theta_j) P(\omega_j) \right] \tag{2-3}$$

where

$$\ell = \sum_{k=1}^{N} \log[p(X_k|\theta)] \tag{2-4}$$

and $\nabla_{\theta_i} \ell$ is the gradient of $\ell$ with respect to $\theta_i$. From the Bayes rule, the
a posteriori probability can be written as

$$P(\omega_i|X_k, \theta) = \frac{p(X_k|\omega_i, \theta_i) P(\omega_i)}{p(X_k|\theta)} \tag{2-5}$$

If the elements of $\theta_i$ and $\theta_j$ are assumed to be functionally independent, using
equation (2-5) in equation (2-3) yields

$$\nabla_{\theta_i} \ell = \sum_{k=1}^{N} P(\omega_i|X_k, \theta) \nabla_{\theta_i} \left\{ \log[p(X_k|\omega_i, \theta_i) P(\omega_i)] \right\} \tag{2-6}$$

The following likelihood equation for the a priori probabilities can easily be
obtained from equation (2-6) by introducing Lagrangian multipliers to take into
account the probability constraints on $P(\omega_i)$.

$$P(\omega_i) = \frac{1}{N} \sum_{k=1}^{N} P(\omega_i|X_k, \theta) \tag{2-7}$$

Since $\theta_i$ is a parameter vector of the density of the $i$th cluster,
equation (2-6) can be written as

$$\nabla_{\theta_i} \ell = \sum_{k=1}^{N} P(\omega_i|X_k, \theta) \nabla_{\theta_i} \left\{ \log[p(X_k|\omega_i, \theta_i)] \right\} \tag{2-8}$$

From equation (2-8), general maximum likelihood equations for the parameters of
the cluster conditional densities can be obtained.
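
As an illustration of equations (2-5) and (2-7), the following minimal sketch computes the a posteriori probabilities for a mixture of Gaussian clusters and then updates the a priori probabilities. The function names, the use of NumPy, and the choice of full-covariance Gaussian components are assumptions made for the example; they are not taken from the report.

```python
import numpy as np

def posteriors(X, priors, means, covs):
    """Return an (N, m) array of P(w_i | X_k, theta), as in eq. (2-5).

    Assumes (for illustration only) Gaussian cluster densities with
    mean vectors `means[i]` and covariance matrices `covs[i]`.
    """
    N, n = X.shape
    m = len(priors)
    log_joint = np.empty((N, m))
    for i in range(m):
        diff = X - means[i]
        _, logdet = np.linalg.slogdet(covs[i])
        maha = np.einsum("kj,kj->k", diff @ np.linalg.inv(covs[i]), diff)
        # log[ p(X_k | w_i, theta_i) P(w_i) ]
        log_joint[:, i] = np.log(priors[i]) - 0.5 * (n * np.log(2 * np.pi) + logdet + maha)
    # Normalise by p(X_k | theta); subtracting the row maximum avoids underflow.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def update_priors(post):
    """Eq. (2-7): the new P(w_i) is the average posterior over the N patterns."""
    return post.mean(axis=0)
```

Alternating these two steps with the parameter updates derived from equation (2-8) gives one pass of a fixed-point iteration.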
3. APPROXIMATING PROBABILITY DENSITY FUNCTIONS WITH DEPENDENT FEATURE TREES


If the probability density function of the $i$th class is approximated by a
first-order dependent feature tree, it can be written as

$$p_t(X) = \prod_{\ell=1}^{n} p_i(x_{m_\ell}|x_{m_{j(\ell)}}), \qquad 0 \le j(\ell) < \ell \tag{3-1}$$

where $x_{m_\ell}$ is the $m_\ell$th feature of pattern vector X; $(m_1, \ldots, m_n)$ is an unknown
permutation of integers 1, 2, ..., n; and $p(x_i|x_0)$, by definition, is equal to
$p(x_i)$. Each variable in the above expansion may be conditioned upon, at most,
one of the other variables. Figure 3-1 shows an example of a dependent feature
tree.


Figure 3-1.- An example of a dependent feature tree.

The component of the density in the product approximation that is represented
by a single link, such as the one connecting features $x_2$ and $x_3$ in figure 3-1,
is $p(x_3|x_2)$. The density that is approximated by the dependence tree of
figure 3-1 can be written as

$$p(X) = p(x_1)p(x_2|x_1)p(x_3|x_2)p(x_4|x_2)p(x_5|x_1)p(x_6|x_5)p(x_7|x_5)p(x_8|x_5) \tag{3-2}$$
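
The product form of equation (3-2) can be evaluated one link at a time. The sketch below does this for Gaussian features, using the standard conditional mean and variance of a bivariate normal density. The dictionary layout (`parent`, `mean`, `var`, `cov_with_parent`) and the function name are hypothetical and chosen only for the example.

```python
import math

def tree_log_density(x, parent, mean, var, cov_with_parent):
    """Log of a first-order tree product density with Gaussian links.

    x, mean, var     : dicts keyed by feature number
    parent           : dict feature -> parent feature (None for the root)
    cov_with_parent  : dict feature -> covariance with its parent
    All names are illustrative assumptions.
    """
    total = 0.0
    for feat, par in parent.items():
        if par is None:
            mu, v = mean[feat], var[feat]          # root term p(x_root)
        else:
            slope = cov_with_parent[feat] / var[par]
            mu = mean[feat] + slope * (x[par] - mean[par])         # conditional mean
            v = var[feat] - cov_with_parent[feat] ** 2 / var[par]  # conditional variance
        total += -0.5 * (math.log(2 * math.pi * v) + (x[feat] - mu) ** 2 / v)
    return total

# Tree structure read off equation (3-2): each feature maps to the
# feature it is conditioned upon.
FIG_3_1_PARENT = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 5, 8: 5}
```

For a two-feature tree the result coincides with the log of the ordinary bivariate normal density, which gives a simple way to check an implementation.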

3.1 CONSTRUCTION OF OPTIMAL DEPENDENT FEATURE TREES 

This section concerns the problem of constructing dependent feature trees. The
dependent feature tree, the density of which best approximates the true density,
is proposed to be constructed using either the criterion of information
preservation (ref. 13) or the criterion of class separability (ref. 15). An algorithm
developed by Kruskal (ref. 16) provides an efficient computational procedure for
constructing optimal dependent feature trees using the expressions developed in
the following.

3.1.1 A CRITERION BASED ON INFORMATION MEASURE

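Let $p_{it}(X)$ be the approximate density of the $i$th class with the product
approximation. That is,

$$p_{it}(X) = \prod_{\ell=1}^{n} p_i(x_{m_\ell}|x_{m_{j(\ell)}}) \tag{3-3}$$

Consider the following measure of closeness between the true and approximate
densities (ref. 13). That is,

$$I(p, p_t) = \sum_{i=1}^{C} P(\omega_i) \int p_i(X) \log\left[\frac{p_i(X)}{p_{it}(X)}\right] dX \tag{3-4}$$

where C is the number of classes. From equation (3-4), it is seen that
$I(p, p_t) = 0$ whenever $p_i(X)$ is equal to $p_{it}(X)$ for all X and that $I(p, p_t) > 0$ if
$p_i(X)$ is different from $p_{it}(X)$ for some X. To find the product approximation
for the densities or the dependent feature tree that minimizes $I(p, p_t)$,
consider

$$\begin{aligned}
I(p, p_t) &= -\sum_{i=1}^{C} P(\omega_i) \int p_i(X) \log[p_{it}(X)]\, dX + \sum_{i=1}^{C} P(\omega_i) \int p_i(X) \log[p_i(X)]\, dX \\
&= -\sum_{i=1}^{C} P(\omega_i) \sum_{\ell=1}^{n} I_i(x_{m_\ell}, x_{m_{j(\ell)}}) + K
\end{aligned} \tag{3-5}$$

where

$$I_i(x_{m_\ell}, x_{m_{j(\ell)}}) = \iint p_i(x_{m_\ell}, x_{m_{j(\ell)}}) \log\left[\frac{p_i(x_{m_\ell}, x_{m_{j(\ell)}})}{p_i(x_{m_\ell})\, p_i(x_{m_{j(\ell)}})}\right] dx_{m_\ell}\, dx_{m_{j(\ell)}}$$

and

$$K = -\sum_{\ell=1}^{n} \sum_{i=1}^{C} P(\omega_i) \int p_i(x_\ell) \log[p_i(x_\ell)]\, dx_\ell + \sum_{i=1}^{C} P(\omega_i) \int p_i(X) \log[p_i(X)]\, dX \tag{3-6}$$

The quantity $I_i(x_{m_\ell}, x_{m_{j(\ell)}})$ is the mutual information between features $x_{m_\ell}$ and
$x_{m_{j(\ell)}}$ of class i. Since K does not depend on the structure of the tree,
minimizing $I(p, p_t)$ amounts to maximizing the sum of the class-averaged mutual
informations over the links of the tree. From equation (3-5), Kruskal's algorithm
(ref. 16) can be efficiently used to construct optimal dependent feature trees.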
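
The tree construction described above can be sketched as a maximum-weight spanning tree search in the manner of Kruskal's algorithm (ref. 16). In the sketch, features are numbered from 0, and the link weights stand for the class-averaged mutual informations $\sum_i P(\omega_i) I_i(x_a, x_b)$. The function name and the dictionary format of the weights are assumptions made for the example.

```python
def max_spanning_tree(n, weights):
    """Kruskal-style maximum-weight spanning tree.

    n       : number of features, labelled 0 .. n-1
    weights : dict {(a, b): w} of link weights, e.g. class-averaged
              mutual informations (hypothetical input format)
    Returns the list of selected links (a, b).
    """
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    edges = []
    # Visit candidate links from the largest weight to the smallest.
    for (a, b), _w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb:            # accept the link only if it closes no cycle
            parent[ra] = rb
            edges.append((a, b))
    return edges
```

Any node of the resulting tree may then be chosen as the root, which fixes the direction of conditioning along each link.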


3.1.2 A CRITERION BASED ON PROBABILISTIC DISTANCE MEASURES 


A probability density function, like any other function, can be approximated by 
a number of different procedures. In the sense of preserving the separability 
between the classes, it is proposed that a criterion based on probabilistic 
distance measures such as divergence be used to construct dependent feature 
trees. The measure of closeness between the approximate and true densities is 
defined as 


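$$J_{12} = \int p_1(X) \log\left[\frac{p_{1t}(X)}{p_{2t}(X)}\right] dX + \int p_2(X) \log\left[\frac{p_{2t}(X)}{p_{1t}(X)}\right] dX \tag{3-7}$$

From equation (3-7), it is seen that $J_{12}$ is large whenever the ratio of $p_{1t}(X)$
to $p_{2t}(X)$ is large in the region of class 1 and the ratio of $p_{2t}(X)$ to $p_{1t}(X)$
is large in the region of class 2. By using the product approximation of
equation (3-3) for the densities $p_{it}(X)$, equation (3-7) can be written as follows.

$$\begin{aligned}
J_{12} &= \sum_{\ell=1}^{n} \int p_1(X) \log\left[\frac{p_1(x_{m_\ell}|x_{m_{j(\ell)}})}{p_2(x_{m_\ell}|x_{m_{j(\ell)}})}\right] dX + \sum_{\ell=1}^{n} \int p_2(X) \log\left[\frac{p_2(x_{m_\ell}|x_{m_{j(\ell)}})}{p_1(x_{m_\ell}|x_{m_{j(\ell)}})}\right] dX \\
&= \sum_{\ell=1}^{n} \iint \left[p_1(x_{m_\ell}, x_{m_{j(\ell)}}) - p_2(x_{m_\ell}, x_{m_{j(\ell)}})\right] \log\left[\frac{p_1(x_{m_\ell}|x_{m_{j(\ell)}})}{p_2(x_{m_\ell}|x_{m_{j(\ell)}})}\right] dx_{m_\ell}\, dx_{m_{j(\ell)}}
\end{aligned} \tag{3-8}$$

where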



(3-9) 


(3-10) 


If more than two classes exist, the expected value of the measure of closeness 
defined over pairs of classes can be used to obtain optimal approximations for 
the densities (ref. 17). From equation (3-8), Kruskal's algorithm (ref. 16) 
can be efficiently used to construct optimal dependent feature trees. 


3.2 EXPRESSIONS FOR THE CRITERIA WHEN THE DISTRIBUTIONS OF THE FEATURES ARE 
GAUSSIAN 


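Expressions are derived in this section for the mutual information and for
$\Delta J_{12}$ between the features, assuming that the distributions of the features are
Gaussian. If $p_\ell(x_i)$, the density of feature $x_i$ of the $\ell$th class, is Gaussian,
it can be written as

$$p_\ell(x_i) = \frac{1}{\sqrt{2\pi}\, \sigma_i(\ell)} \exp\left\{-\frac{1}{2}\left[\frac{x_i - u_i(\ell)}{\sigma_i(\ell)}\right]^2\right\} \tag{3-11}$$

or it is denoted as $p_\ell(x_i) \sim N[u_i(\ell), \sigma_i^2(\ell)]$. The joint and conditional
densities of features $x_i$ and $x_j$ of the $\ell$th class can be written as follows.

$$p_\ell(x_i, x_j) = \frac{1}{2\pi\, \sigma_i(\ell) \sigma_j(\ell) \left[1 - \rho_{ij}^2(\ell)\right]^{1/2}} \exp[-q_\ell(x_i, x_j)] \tag{3-12}$$

where

$$q_\ell(x_i, x_j) = \frac{1}{2[1 - \rho_{ij}^2(\ell)]} \left\{ \left[\frac{x_i - u_i(\ell)}{\sigma_i(\ell)}\right]^2 - 2\rho_{ij}(\ell) \left[\frac{x_i - u_i(\ell)}{\sigma_i(\ell)}\right] \left[\frac{x_j - u_j(\ell)}{\sigma_j(\ell)}\right] + \left[\frac{x_j - u_j(\ell)}{\sigma_j(\ell)}\right]^2 \right\} \tag{3-13}$$

$$\rho_{ij}(\ell) = \frac{\sigma_{ij}(\ell)}{\sigma_i(\ell)\, \sigma_j(\ell)} \tag{3-14}$$

and $\sigma_{ij}(\ell)$ is the covariance between features $x_i$ and $x_j$ of the $\ell$th class. From
equations (3-11) and (3-12), the conditional density can be written as

$$p_\ell(x_i|x_j) = \frac{1}{\sqrt{2\pi}\, \sigma_i(\ell) \left[1 - \rho_{ij}^2(\ell)\right]^{1/2}} \exp[-q_\ell(x_i|x_j)] \tag{3-15}$$

where

$$q_\ell(x_i|x_j) = \frac{1}{2\sigma_i^2(\ell)[1 - \rho_{ij}^2(\ell)]} \left\{ x_i - u_i(\ell) - \rho_{ij}(\ell) \frac{\sigma_i(\ell)}{\sigma_j(\ell)} [x_j - u_j(\ell)] \right\}^2 \tag{3-16}$$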


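3.2.1 AN EXPRESSION FOR THE MUTUAL INFORMATION BETWEEN FEATURES $x_i$ AND $x_j$

In this section, an expression is derived for the mutual information between
Gaussian-distributed features $x_i$ and $x_j$ of class $\ell$. From equations (3-11) and
(3-15), the following can easily be obtained. Consider

$$\frac{p_\ell(x_i|x_j)}{p_\ell(x_i)} = \left[1 - \rho_{ij}^2(\ell)\right]^{-1/2} \exp\left\{ -\frac{1}{2[1 - \rho_{ij}^2(\ell)]} \left[ \rho_{ij}^2(\ell) \left(\frac{x_i - u_i(\ell)}{\sigma_i(\ell)}\right)^2 - 2\rho_{ij}(\ell) \left(\frac{x_i - u_i(\ell)}{\sigma_i(\ell)}\right) \left(\frac{x_j - u_j(\ell)}{\sigma_j(\ell)}\right) + \rho_{ij}^2(\ell) \left(\frac{x_j - u_j(\ell)}{\sigma_j(\ell)}\right)^2 \right] \right\} \tag{3-17}$$

From equation (3-17), the mutual information between features $x_i$ and $x_j$ of
class $\ell$ can be obtained as follows.

$$I_\ell(x_i, x_j) = \iint p_\ell(x_i, x_j) \log\left[\frac{p_\ell(x_i, x_j)}{p_\ell(x_i)\, p_\ell(x_j)}\right] dx_i\, dx_j = -\frac{1}{2} \log\left[1 - \rho_{ij}^2(\ell)\right] \ge 0 \tag{3-18}$$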
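
Equation (3-18) makes the link weights easy to compute from a class covariance matrix. The following sketch evaluates the Gaussian mutual information for one pair of features and for all pairs at once. The function names are hypothetical; natural logarithms are used, so the values are in nats.

```python
import numpy as np

def gaussian_mutual_information(var_i, var_j, cov_ij):
    """Mutual information -0.5 * log(1 - rho^2) of two jointly Gaussian features."""
    rho2 = cov_ij ** 2 / (var_i * var_j)
    return -0.5 * np.log(1.0 - rho2)

def mutual_information_matrix(cov):
    """Pairwise Gaussian mutual information from a covariance matrix.

    The diagonal is set to zero because a feature is never linked to itself.
    """
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))
    rho2 = (cov / np.outer(std, std)) ** 2
    np.fill_diagonal(rho2, 0.0)
    return -0.5 * np.log1p(-rho2)
```

The off-diagonal entries of the returned matrix, averaged over classes with the a priori probabilities as weights, can serve as the link weights for the tree construction of section 3.1.1.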
3.2.2 AN EXPRESSION FOR $\Delta J_{12}(x_i, x_j)$ WHEN FEATURES $x_i$ AND $x_j$ ARE NORMALLY
DISTRIBUTED

From equations (3-7) and (3-9), when features $x_i$ and $x_j$ are normally
distributed, an expression for $\Delta J_{12}(x_i, x_j)$ can be easily obtained as follows.

(3-19)

where

$$\Delta_{ij}(\ell) = \sigma_i^2(\ell)\, \sigma_j^2(\ell) - \sigma_{ij}^2(\ell) \tag{3-20}$$


4. A GENERAL DEPENDENT FEATURE TREE


A general dependent feature tree is shown in figure 4-1. 



Figure 4-1.- A general dependent feature tree. 


Each node of the tree represents a feature, and the feature numbers are given
in figure 4-1. In approximating the probability density functions with dependent
feature trees, each feature may be conditioned upon, at most, one of the
other features. Node $x_1$ is the root node of the tree. Nodes $x_2$, $x_4$, $x_5$, $x_7$,
etc., are nodes on the periphery of the tree.


4.1 DIFFERENT TYPES OF NODES

For convenience in the following analysis, the nodes of the dependent feature
tree are divided into the following types.

1. Type I nodes: Except for the root node, nodes on the periphery of the tree
are defined as type I nodes. For example, in figure 4-1, nodes $x_2$, $x_4$, $x_5$,
$x_7$, etc., are type I nodes.

2. Type II nodes: These are nodes which are one node deep from the periphery.

For example, in figure 4-1, nodes Xg, x^q, x^g, x^y, x^g, etc., are type II 
nodes. 

3. Type III nodes: These are nodes which are at least two nodes deep from the 

periphery. For example, in figure 4-1, nodes X 3 , xg, xg, Xjg, etc., are 
type III nodes. 

4. Type IV nodes: The root node of the tree is defined as a type IV node. 

The types of root nodes are divided into type IVa and type IVb. Examples 
of the types of root nodes are described in the following. 

a. Type IVa node: The type IVa node is the root node of a tree with a
single link. As an example, node $x_1$ of figure 4-2 is a type IVa node.

b. Type IVb node: The type IVb node is the root node of a tree with two
or more links. As an example, node $x_1$ of figure 4-3 is a type IVb
node.

It is noted that the type IVb node is different from the type IVa node in that
more than one node links directly with the root node of the tree.
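
The node types defined above can be assigned mechanically once the tree is stored as a parent map. The sketch below is one possible reading of the definitions: it interprets "one node deep from the periphery" as a node all of whose children are peripheral nodes, and "at least two nodes deep" as any deeper non-root node. That interpretation, the function name, and the data layout are assumptions made for the example. The sample tree is the one given by equation (4-1) for figure 4-4.

```python
def classify_nodes(parent):
    """Label each node 'I', 'II', 'III', 'IVa', or 'IVb'.

    parent : dict node -> parent node (None for the root); the tree is
             assumed to contain at least one link.
    """
    children = {v: [] for v in parent}
    root = None
    for v, p in parent.items():
        if p is None:
            root = v
        else:
            children[p].append(v)

    height = {}

    def node_height(v):
        # Number of links from v down to its farthest peripheral descendant.
        if v not in height:
            kids = children[v]
            height[v] = 0 if not kids else 1 + max(node_height(c) for c in kids)
        return height[v]

    types = {}
    for v in parent:
        if v == root:
            types[v] = "IVa" if len(children[v]) == 1 else "IVb"
        elif node_height(v) == 0:
            types[v] = "I"
        elif node_height(v) == 1:
            types[v] = "II"
        else:
            types[v] = "III"
    return types

# Tree of figure 4-4 as described by equation (4-1).
FIG_4_4_PARENT = {1: None, 2: 1, 3: 2, 4: 2, 5: 2, 6: 5, 7: 4}
```

With this reading, the tree of figure 4-4 has a type IVa root $x_1$, a type III node $x_2$, type II nodes $x_4$ and $x_5$, and type I nodes $x_3$, $x_6$, and $x_7$.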
4.2 AN EXPRESSION FOR THE COVARIANCE BETWEEN THE FEATURES CONNECTED BY A PATH
IN A DEPENDENT FEATURE TREE

An expression for the covariance between the features when a path connects 
their representative nodes in a dependent feature tree is developed in this 
section. For example, features Xj^j^ and Xj^g are connected through features x^g, 
Xg, xg, Xj 3 , and x^^g in the dependent feature tree of figure 4-1. For the 
following analysis, consider the dependent feature tree shown in figure 4-4. 



Figure 4-4.- An example dependent feature tree. 


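The probability density represented by the dependent feature tree of figure 4-4
can be written as follows.

$$p(X) = p(x_1)p(x_2|x_1)p(x_3|x_2)p(x_4|x_2)p(x_5|x_2)p(x_6|x_5)p(x_7|x_4) \tag{4-1}$$

In the following, an expression for the covariance between features $x_6$ and $x_7$
of figure 4-4 is derived.

$$\begin{aligned}
E[(x_6 - u_6)(x_7 - u_7)] &= \int (x_6 - u_6)(x_7 - u_7)\, p(X)\, dX \\
&= \int p(x_2)\, dx_2 \int p(x_5|x_2)\, dx_5 \int (x_6 - u_6)\, p(x_6|x_5)\, dx_6 \int p(x_4|x_2)\, dx_4 \int (x_7 - u_7)\, p(x_7|x_4)\, dx_7
\end{aligned} \tag{4-2}$$

From equation (3-15), the following equations are obtained.

$$\int (x_7 - u_7)\, p(x_7|x_4)\, dx_7 = \rho_{74} \frac{\sigma_7}{\sigma_4} (x_4 - u_4) \tag{4-3}$$

$$\int (x_4 - u_4)\, p(x_4|x_2)\, dx_4 = \rho_{42} \frac{\sigma_4}{\sigma_2} (x_2 - u_2) \tag{4-4}$$

Using equations (4-3) and (4-4) in equation (4-2) yields

$$E[(x_6 - u_6)(x_7 - u_7)] = \rho_{74} \frac{\sigma_7}{\sigma_4}\, \rho_{42} \frac{\sigma_4}{\sigma_2} \int (x_2 - u_2)\, p(x_2)\, dx_2 \int p(x_5|x_2)\, dx_5 \int (x_6 - u_6)\, p(x_6|x_5)\, dx_6 \tag{4-5}$$

Similar to equations (4-3) and (4-4), which were developed from equation (3-15),
the following are obtained.

$$\left. \begin{aligned}
\int (x_6 - u_6)\, p(x_6|x_5)\, dx_6 &= \rho_{65} \frac{\sigma_6}{\sigma_5} (x_5 - u_5) \\
\int (x_5 - u_5)\, p(x_5|x_2)\, dx_5 &= \rho_{52} \frac{\sigma_5}{\sigma_2} (x_2 - u_2) \\
\int (x_2 - u_2)(x_2 - u_2)\, p(x_2)\, dx_2 &= \sigma_2^2
\end{aligned} \right\} \tag{4-6}$$

From equations (4-5) and (4-6), the covariance between features $x_6$ and $x_7$ can
be obtained as follows.

$$E[(x_6 - u_6)(x_7 - u_7)] = \rho_{65}\, \rho_{52}\, \rho_{42}\, \rho_{74}\, \sigma_6\, \sigma_7 \tag{4-7}$$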


For a general case, the following theorems can easily be established.


Theorem 1: Suppose the features $x_1$ and $x_{r+s}$ in a dependent feature tree are
connected by a path as shown in figure 4-5. Then, the covariance between
features $x_1$ and $x_{r+s}$ is given by equation (4-8).


Figure 4-5.- A path between features $x_1$ and $x_{r+s}$
in a dependent feature tree.

$$E[(x_1 - u_1)(x_{r+s} - u_{r+s})] = \sigma_{12}\, \frac{\sigma_{23}}{\sigma_2^2} \cdots \frac{\sigma_{r-1,r}}{\sigma_{r-1}^2} \cdot \frac{\sigma_{r+s,r+s-1}}{\sigma_{r+s-1}^2}\, \frac{\sigma_{r+s-1,r+s-2}}{\sigma_{r+s-2}^2} \cdots \frac{\sigma_{r+2,r+1}}{\sigma_{r+1}^2}\, \frac{\sigma_{r+1,r}}{\sigma_r^2} \tag{4-8}$$


Theorem 2: Suppose the features $x_1$ and $x_r$ in a dependent feature tree are
connected by a path as shown in figure 4-6. Then, the covariance between
features $x_1$ and $x_r$ is given by equation (4-9).


Figure 4-6.- A path between features $x_1$ and $x_r$
in a dependent feature tree.

$$E[(x_1 - u_1)(x_r - u_r)] = \sigma_{12}\, \frac{\sigma_{23}}{\sigma_2^2}\, \frac{\sigma_{34}}{\sigma_3^2} \cdots \frac{\sigma_{r-2,r-1}}{\sigma_{r-2}^2}\, \frac{\sigma_{r-1,r}}{\sigma_{r-1}^2} \tag{4-9}$$
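
Both theorems reduce to the same rule: the covariance between the end features of a path is the product of the covariances of the links on the path divided by the product of the variances of the interior features. The sketch below implements that rule and includes a small simulated three-feature chain for checking it numerically. The function names and the chain coefficients are assumptions made for the example.

```python
import numpy as np

def path_covariance(link_covs, interior_vars):
    """Covariance between the two end features of a path in the tree.

    link_covs     : covariances of the consecutive links along the path
    interior_vars : variances of the features strictly between the ends
    """
    return float(np.prod(link_covs)) / float(np.prod(interior_vars))

def simulate_chain(n_samples, seed=0):
    """Draw a Gaussian chain x1 -> x2 -> x3 with unit variances.

    Built so that cov(x1, x2) = 0.8 and cov(x2, x3) = -0.5 (illustrative values).
    """
    rng = np.random.default_rng(seed)
    x1 = rng.normal(0.0, 1.0, n_samples)
    x2 = 0.8 * x1 + rng.normal(0.0, 0.6, n_samples)             # var = 0.64 + 0.36
    x3 = -0.5 * x2 + rng.normal(0.0, np.sqrt(0.75), n_samples)  # var = 0.25 + 0.75
    return x1, x2, x3
```

For the chain above, the rule predicts a covariance of $(0.8)(-0.5)/1 = -0.4$ between $x_1$ and $x_3$, which the sample covariance of a large simulated set approaches. For the path of figure 4-4, the call would take the four link covariances $\sigma_{65}$, $\sigma_{52}$, $\sigma_{42}$, $\sigma_{74}$ and the three interior variances $\sigma_5^2$, $\sigma_2^2$, $\sigma_4^2$.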
5. MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF THE DENSITY FUNCTIONS


In this section, maximum likelihood equations are developed for estimating the
parameters of the cluster conditional densities when approximated by the
first-order dependent feature trees. In practice, such as in the classification of
remotely sensed multispectral scanner imagery data, considerable interest has
been shown in applying maximum likelihood clustering for the decomposition of
the mixture density of the data into its normal component densities. The
mixture density p(X) can be written as

$$p(X) = \sum_{i=1}^{m} P(\omega_i)\, p(X|\omega_i) \tag{5-1}$$

where m is the number of clusters, and $P(\omega_i)$ and $p(X|\omega_i)$ are the a priori
probabilities of the modes and mode conditional densities, respectively. If the
cluster conditional densities are Gaussian [i.e., $p(X|\omega_i) \sim N(U_i, \Sigma_i)$], by using
a given set of N independent observations from the mixture density, from
equation (2-6), the maximum likelihood equations for the estimates of the parameters
of the mixture density can easily be shown to be the following (ref. 6).

$$\begin{aligned}
P(\omega_i) &= \frac{1}{N} \sum_{k=1}^{N} P(\omega_i|X_k) \\
U_i &= \frac{\sum_{k=1}^{N} X_k\, P(\omega_i|X_k)}{\sum_{k=1}^{N} P(\omega_i|X_k)} \\
\Sigma_i &= \frac{\sum_{k=1}^{N} (X_k - U_i)(X_k - U_i)^T P(\omega_i|X_k)}{\sum_{k=1}^{N} P(\omega_i|X_k)}
\end{aligned} \tag{5-2}$$

In maximum likelihood clustering, equation (5-2) is used for updating the
parameters of the densities, and this computation is coupled with a split and
merge sequence. The updating is usually stopped after a few iterations because
of the large amount of computation in clustering data such as imagery data.
For practical problems, the number of parameters to be estimated is large.

Using equation (5-2), the number of parameters to be estimated for each mode is
n(n + 3)/2, where n is the dimensionality of the patterns. It is known that,
with a fixed sample size, the accuracy of estimation usually decreases as the
number of parameters to be estimated increases (ref. 12).

In this paper, the cluster conditional densities are approximated with
first-order dependent feature trees to reduce the number of parameters to be
estimated. In the product approximation for the densities discussed in the
previous sections, it is noted that each feature is conditioned upon, at most,
one of the other features. The number of parameters to be estimated for each
mode is obtained as follows: the means n, the variances n, and the covariances
(n - 1), or a total of (3n - 1), where n is the dimensionality of the patterns.
When the product approximation is used for the probability densities, with an
increase in the dimensionality of the patterns, the reduction in the number of
parameters for each mode is as shown in figure 5-1.



Figure 5-1.- Reduction in the number of parameters with the dimensionality. 
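
The two parameter counts quoted above can be tabulated directly. The short sketch below compares the full Gaussian count n(n + 3)/2 with the dependent feature tree count (3n - 1). The function names and the sample dimensionalities are chosen only for illustration.

```python
def full_gaussian_params(n):
    """Parameters per mode with a full covariance matrix: n means + n(n+1)/2 covariances."""
    return n * (n + 3) // 2

def tree_gaussian_params(n):
    """Parameters per mode with a first-order dependent feature tree:
    n means + n variances + (n - 1) link covariances."""
    return 3 * n - 1

if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, full_gaussian_params(n), tree_gaussian_params(n))
```

The two counts coincide for n = 1 and n = 2, since a full covariance matrix then contains no more than one covariance. From n = 3 onward the tree count grows linearly while the full count grows quadratically.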

In the following, maximum likelihood equations are developed for estimating the
parameters of the cluster conditional densities when approximated with
first-order dependent feature trees. It is assumed that the structure of the
dependent feature tree is determined using the techniques discussed in section 3.

The different types of nodes described in section 4 are considered separately.

5.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE A PRIORI PROBABILITIES, MEANS, AND
VARIANCES


In this section, maximum likelihood equations similar to equation (5-2) are
obtained for the a priori probabilities of the clusters and for the means and
variances of features in each cluster when the cluster conditional densities
are approximated with dependent feature trees. The different types of nodes
discussed in section 4 are treated separately. It is assumed that a set
$\mathcal{X} = \{X_1, X_2, \ldots, X_N\}$ of N unlabeled patterns, each of dimension n, drawn
independently from the mixture density p(X) is given. When the cluster conditional
densities are approximated by first-order dependent feature trees, the density
of the $i$th cluster can be written as

$$p(X|\omega_i) = \prod_{\ell=1}^{n} p_i(x_{m_\ell}|x_{m_{j(\ell)}}) \tag{5-3}$$

The maximum likelihood equations for the a priori probabilities of the clusters
can easily be shown to be the following.

$$P(\omega_i) = \frac{1}{N} \sum_{k=1}^{N} P(\omega_i|X_k, \theta) \tag{5-4}$$

If $\theta_i$ is a parameter of the $i$th cluster, using equation (5-3) in equation (2-8)
results in

$$\frac{\partial \ell}{\partial \theta_i} = \sum_{k=1}^{N} P(\omega_i|X_k, \theta)\, \frac{\partial}{\partial \theta_i} \left\{ \sum_{\ell=1}^{n} \log\left[p_i(x_{m_\ell}^k|x_{m_{j(\ell)}}^k)\right] \right\} \tag{5-5}$$

where $x_\ell^k$ denotes the $\ell$th feature of pattern $X_k$. In the following, it is
assumed that the distributions of pattern features in each cluster are Gaussian.
That is,

$$p_i(x_\ell) \sim N[u_\ell(i), \sigma_\ell^2(i)] \tag{5-6}$$


5.1.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE I NODES

Consider a link in a dependent feature tree containing a type I node, as shown
in figure 5-2.

Figure 5-2.- A link in a dependent feature tree
with a type I node.


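Since each feature is conditioned upon, at most, one of the other features,
equation (5-5) becomes

$$\frac{\partial \ell}{\partial \phi_i} = \sum_{k=1}^{N} P(\omega_i|X_k, \theta)\, \frac{\partial}{\partial \phi_i} \left\{ \log[p_i(x_2^k, x_3^k)] \right\} \qquad \text{for } \phi_i = u_3(i) \text{ and } \phi_i = \sigma_3(i) \tag{5-7}$$

When $\phi_i = u_3(i)$, from equation (5-7), the following is obtained.

$$u_3(i) = \frac{\sum_{k=1}^{N} P(\omega_i|X_k, \theta) \left\{ x_3^k - \frac{\sigma_{23}(i)}{\sigma_2^2(i)} [x_2^k - u_2(i)] \right\}}{\sum_{k=1}^{N} P(\omega_i|X_k, \theta)} \tag{5-8}$$

In equation (5-7), letting $\phi_i = \sigma_3(i)$ and $\phi_i = \sigma_{23}(i)$ and eliminating
the terms in $[x_3^k - u_3(i)]^2$ from the resulting equations yields, after
simplification, an expression for the covariance between type I and type II nodes.
That is,

$$\sigma_{23}(i) = \sigma_2^2(i)\, \frac{\sum_{k=1}^{N} P(\omega_i|X_k, \theta)\, [x_2^k - u_2(i)][x_3^k - u_3(i)]}{\sum_{k=1}^{N} P(\omega_i|X_k, \theta)\, [x_2^k - u_2(i)]^2} \tag{5-9}$$

Letting $\phi_i = \sigma_3(i)$ in equation (5-7) and using equation (5-9) yields the
following.

$$\sigma_3^2(i) = \frac{\sum_{k=1}^{N} P(\omega_i|X_k, \theta) \left\{ [x_3^k - u_3(i)]^2 + \frac{\sigma_{23}^2(i)}{\sigma_2^2(i)} - \frac{\sigma_{23}(i)}{\sigma_2^2(i)} [x_2^k - u_2(i)][x_3^k - u_3(i)] \right\}}{\sum_{k=1}^{N} P(\omega_i|X_k, \theta)} \tag{5-10}$$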
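
One fixed-point pass over the type I equations can be sketched as follows. The code assumes the weighted-regression reading of equations (5-8) to (5-10): the mean of the peripheral feature is updated first, then the link covariance, then the variance, with the parameters of the neighboring feature $x_2$ held fixed. The function name, the argument layout, and the update order are assumptions made for the example.

```python
import numpy as np

def update_type_one(x2, x3, post, u2, var2, cov23):
    """One pass of the type I updates for a peripheral feature x3 linked to x2.

    x2, x3 : arrays holding the two features for the N patterns
    post   : array of P(w_i | X_k, theta) for the cluster being updated
    u2, var2, cov23 : current mean and variance of x2 and the link covariance
    Returns (u3, cov23, var3).
    """
    w = post / post.sum()                       # normalised posterior weights
    slope = cov23 / var2
    u3 = np.sum(w * (x3 - slope * (x2 - u2)))   # mean update, as in eq. (5-8)
    d2, d3 = x2 - u2, x3 - u3
    slope_new = np.sum(w * d2 * d3) / np.sum(w * d2 * d2)
    cov23_new = slope_new * var2                # covariance update, as in eq. (5-9)
    var3 = np.sum(w * (d3 ** 2 + cov23_new ** 2 / var2 - slope_new * d2 * d3))  # eq. (5-10)
    return u3, cov23_new, var3
```

With equal posterior weights, and with `u2`, `var2`, and `cov23` set to the sample mean, sample variance, and sample covariance, the pass returns the ordinary sample moments of $x_3$, which gives a quick consistency check.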
5.1.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE II NODES

A typical type II node, as defined in section 4, in a general dependent feature
tree is shown in figure 5-3.


Figure 5-3.- A typical type II node in a
general dependent feature tree.


In figure 5-3, node $x_s$ is a type II node, and nodes $x_1$, $x_2$, ..., $x_n$ are type I
nodes. The terms in the product approximation of the probability density
function of cluster i containing feature $x_s$ are

$$p(X|\omega_i) = \cdots\, p(x_s|x_t)\, p(x_1|x_s) \cdots p(x_n|x_s) \cdots \tag{5-11}$$

If $\theta_i$ is a parameter of the $i$th cluster, using equation (5-11) in
equation (2-8) yields

$$\frac{\partial \ell}{\partial \theta_i} = \sum_{k=1}^{N} P(\omega_i|X_k, \theta)\, \frac{\partial}{\partial \theta_i} \left\{ \sum_{\ell=1}^{n} \log[p_i(x_\ell^k|x_s^k)] + \log[p_i(x_s^k|x_t^k)] \right\} = 0 \tag{5-12}$$

From equation (5-12), letting $\theta_i = u_s(i)$ results in the following maximum
likelihood equation for the mean of feature $x_s$ of cluster i.

(5-13)

where

(5-14)

In equation (5-12), letting $\theta_i = \sigma_s(i)$ yields
the following after simplification.

(5-15)
5.1.3 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE III NODES 

A typical type III node in a general dependent feature tree is shown in 
figure 5-4. 




Figure 5-4.- A typical type III node in a 
general dependent feature tree. 


In figure 5-4, $x_s$ is a type III node; and nodes $x_1, x_2, \ldots, x_r$ and $x_t$ may be type III nodes or other types of nodes. Proceeding as in section 5.1.2, the maximum likelihood equations for the variance and mean of feature $x_s$ of cluster $i$ can be shown to be the following (see appendix A).



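(5-16)

and

$$u_s(i) = \frac{\displaystyle\sum_{k=1}^{N} P(\omega_i|X_k,\theta)\left\{x_s^k + \frac{\displaystyle\sum_{\ell=1}^{r}\frac{\sigma_{s\ell}(i)}{\Delta_{s\ell}(i)}[x_\ell^k - u_\ell(i)] + \frac{\sigma_{st}(i)}{\Delta_{st}(i)}[x_t^k - u_t(i)]}{\displaystyle\frac{r}{\sigma_s^2(i)} - \sum_{\ell=1}^{r}\frac{\sigma_\ell^2(i)}{\Delta_{s\ell}(i)} - \frac{\sigma_t^2(i)}{\Delta_{st}(i)}}\right\}}{\displaystyle\sum_{k=1}^{N} P(\omega_i|X_k,\theta)} \tag{5-17}$$

where $\Delta_{rs}(i) = \sigma_r^2(i)\sigma_s^2(i) - \sigma_{rs}^2(i)$, as in equation (A-3).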


5.1.4 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE IVA NODES 

A typical type IVa node in a general dependent feature tree is shown in 
figure 5-5. 


Figure 5-5.- A typical type IVa node in a
general dependent feature tree.


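In figure 5-5, $x_1$ is a root node of type IVa, and node $x_2$ may be of type I, type II, or type III. The maximum likelihood equations for the variance and mean of feature $x_1$ of cluster $i$ are given in the following.

[Equation (5-18), the expression for $\sigma_1^2(i)$, is illegible in the source scan; only its denominator, $\sum_{k=1}^{N} P(\omega_i|X_k,\theta)$, survives.]

and

$$u_1(i) = \frac{\displaystyle\sum_{k=1}^{N} P(\omega_i|X_k,\theta)\left\{x_1^k - \frac{\sigma_{12}(i)}{\sigma_2^2(i)}[x_2^k - u_2(i)]\right\}}{\displaystyle\sum_{k=1}^{N} P(\omega_i|X_k,\theta)} \tag{5-19}$$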


5.1.5 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE IVB NODES 

A typical type IVb node in a general dependent feature tree is shown in
figure 5-6.

Figure 5-6.- A typical type IVb node in a
general dependent feature tree.


In figure 5-6, the root node is of type IVb, and nodes $x_1, x_2, \ldots, x_r$ are of type I, type II, or type III. The maximum likelihood equations for the variance and mean of the root feature of cluster $i$ can be shown to be the following.



In this section, maximum likelihood equations are developed for the covariances 
between the features when the probability density functions of the clusters are 
approximated by first-order dependent feature trees. 


5.2.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPE I AND 
TYPE II NODES 

In this section, maximum likelihood equations for the covariance between type I and type II features are derived. A typical link connecting type I and type II nodes is shown in figure 5-7.

Figure 5-7.- A typical link connecting type I
and type II nodes.


In figure 5-7, node $x_s$ is of type I, and node $x_r$ is of type II. The maximum likelihood equation for the covariance between features $x_r$ and $x_s$ of cluster $i$ is given in the following.

$$\sigma_{rs}(i) = \sigma_r^2(i)\,\frac{\displaystyle\sum_{k=1}^{N} P(\omega_i|X_k,\theta)\,[x_r^k - u_r(i)][x_s^k - u_s(i)]}{\displaystyle\sum_{k=1}^{N} P(\omega_i|X_k,\theta)\,[x_r^k - u_r(i)]^2} \tag{5-22}$$


5.2.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPE IVA AND 
TYPE II OR TYPE III NODES 

A typical link connecting type IVa and type II or type III nodes in a general 
dependent feature tree is shown in figure 5-8. 



Figure 5-8.- A typical link connecting 
type IVa and type II or type III nodes 
in a general dependence tree. 


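In figure 5-8, node $x_1$ is of type IVa, and node $x_2$ may be of type II or type III. The maximum likelihood equation for the covariance between features $x_1$ and $x_2$ of cluster $i$ is as follows.

$$\sigma_{12}(i) = \sigma_2^2(i)\,\frac{\displaystyle\sum_{k=1}^{N} P(\omega_i|X_k,\theta)\,[x_1^k - u_1(i)][x_2^k - u_2(i)]}{\displaystyle\sum_{k=1}^{N} P(\omega_i|X_k,\theta)\,[x_2^k - u_2(i)]^2} \tag{5-23}$$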
5.2.3 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPES OF NODES
OTHER THAN THOSE CONSIDERED IN SECTIONS 5.2.1 AND 5.2.2

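Let there be a link between nodes $x_2$ and $x_3$ in a general dependent feature tree whose types are other than those considered in sections 5.2.1 and 5.2.2. The maximum likelihood equation for the covariance between features $x_2$ and $x_3$ of cluster $i$ is given in the following.

$$\sigma_{23}(i) = \frac{\displaystyle\sum_{k=1}^{N} P(\omega_i|X_k,\theta)\left\{[\sigma_2^2(i)\sigma_3^2(i) + \sigma_{23}^2(i)]\,[x_2^k - u_2(i)][x_3^k - u_3(i)] + \sigma_{23}(i)\Delta_{23}(i)\right\}}{\displaystyle\sum_{k=1}^{N} P(\omega_i|X_k,\theta)\left\{\sigma_3^2(i)[x_2^k - u_2(i)]^2 + \sigma_2^2(i)[x_3^k - u_3(i)]^2\right\}} \tag{5-24}$$

where $\Delta_{23}(i) = \sigma_2^2(i)\sigma_3^2(i) - \sigma_{23}^2(i)$.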
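Equation (5-24) defines $\sigma_{23}(i)$ implicitly, because the covariance also appears on its right-hand side. One possible way to evaluate it, sketched below in Python, is a simple fixed-point iteration with the means and variances held fixed. The iteration scheme, the starting value of zero, the iteration count, and all names are assumptions of this sketch; the report does not state how the equation is to be solved.

```python
import numpy as np

def link_covariance(x2, x3, post, u2, u3, var2, var3, n_iter=100):
    """Fixed-point sketch for the implicit covariance equation (5-24).

    The update applied at every pass is
        cov <- sum_k post_k * ((var2*var3 + cov**2) * d2_k * d3_k + cov * delta)
               / sum_k post_k * (var3 * d2_k**2 + var2 * d3_k**2)
    with delta = var2*var3 - cov**2.
    """
    d2 = x2 - u2
    d3 = x3 - u3
    sum_cross = np.sum(post * d2 * d3)                        # weighted cross products
    sum_post = np.sum(post)                                   # sum of posteriors
    denom = np.sum(post * (var3 * d2 ** 2 + var2 * d3 ** 2))  # denominator of (5-24)
    cov = 0.0                                                 # assumed starting value
    for _ in range(n_iter):
        delta = var2 * var3 - cov ** 2
        cov = ((var2 * var3 + cov ** 2) * sum_cross + cov * delta * sum_post) / denom
    return cov
```

If `var2` and `var3` happen to equal the posterior-weighted sample variances of the two features, the fixed point of this iteration is the posterior-weighted sample covariance, which gives a convenient check.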
6. EXPERIMENTAL RESULTS


In this section, some results from processing remotely sensed Landsat multispectral scanner imagery data are presented. The images are of a 5- by 6-nautical-mile area called a segment. The image is divided into a rectangular array of pixels, 117 rows by 196 columns. The image is overlaid with a rectangular grid of 209 grid intersections. Two classes are considered; class 1 is wheat, and class 2 is "other." The true (ground truth) labels for the pixels at the grid intersections are acquired. The locations of the segments and the individual acquisitions used for each of the segments are listed in table 6-1. The a priori probabilities of the classes are estimated as sample estimates. Equations (3-6) and (3-18) are used to compute the weighted mutual information between the features, assuming in each class that the features are Gaussian distributed. Kruskal's algorithm (ref. 16) is used to construct optimal dependent feature trees by minimizing $I(p,p_t)$ of equation (3-5). The optimal dependent feature trees of segments 1648 and 1739 are shown in figures 6-1 and 6-2, respectively.
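The tree construction described above can be sketched in Python as follows. For two jointly Gaussian scalar features with correlation coefficient $\rho$, the mutual information is $-\tfrac{1}{2}\ln(1-\rho^2)$, and in the standard dependence-tree construction the best tree is the spanning tree with the largest total pairwise mutual information, which Kruskal's algorithm finds by adding edges in decreasing order of weight. The sketch uses a single covariance matrix rather than the class-weighted mutual information of equation (3-18), and the function names and the four-channel covariance in the usage note are hypothetical.

```python
import numpy as np

def gaussian_mutual_information(cov):
    """Pairwise mutual information -0.5*log(1 - rho^2) for jointly Gaussian features."""
    std = np.sqrt(np.diag(cov))
    rho = cov / np.outer(std, std)
    # Clip so that perfectly correlated pairs do not produce log(0).
    mi = -0.5 * np.log(1.0 - np.clip(rho ** 2, 0.0, 1.0 - 1e-12))
    np.fill_diagonal(mi, 0.0)
    return mi

def kruskal_max_spanning_tree(weights):
    """Return the edges (a, b) of a maximum-weight spanning tree (Kruskal)."""
    n = weights.shape[0]
    parent = list(range(n))          # union-find forest

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    edges = sorted(((weights[a, b], a, b)
                    for a in range(n) for b in range(a + 1, n)), reverse=True)
    tree = []
    for _, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:         # edge joins two components, so no cycle
            parent[root_a] = root_b
            tree.append((a, b))
    return tree
```

For example, with a hypothetical covariance in which channels 0 and 1 and channels 2 and 3 are strongly correlated, `kruskal_max_spanning_tree(gaussian_mutual_information(cov))` links each of those pairs directly and joins the two pairs through the next strongest dependency.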



Figure 6-1.- Optimal dependent feature tree 
of segment 1648. 

Generally, it is known that, for each acquisition, a strong dependency exists between channels 1 and 2 and between channels 3 and 4. From figures 6-1 and 6-2, it is seen that these dependencies appear in the optimal dependent feature trees.

TABLE 6-1.- CONFUSION MATRICES AND CLASSIFICATION ACCURACIES OF BAYES CLASSIFIER

[Table body not recoverable from the source scan. Surviving footnotes: "Confusion matrix." and "Probability of correct classification."]

Figure 6-2.- Optimal dependent feature tree
of segment 1739.

To investigate the effectiveness of the optimal dependent feature trees in 
classification, an experiment is performed to compare the classification accu- 
racies of the Bayes classifier (1) when the densities are approximated with 
optimal dependent feature trees, (2) when no approximation is used for the 
densities (full covariance matrix), (3) assuming the features are independent, 
and (4) when the densities are approximated with arbitrary dependent feature 
trees. Spectral vectors of 104 labeled pixels are used as the training pattern 
set, and the spectral vectors of the remaining 105 labeled pixels are used as 
the test pattern set. The structure of the arbitrary dependent feature tree 
used in this experiment is shown in figure 6-3. 
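The tree-approximated class-conditional density used by the Bayes classifier in this comparison can be evaluated as a product of a root density and one conditional density per link. The Python sketch below assumes Gaussian features, a tree encoded as a hypothetical `parent` list (`parent[j]` is the index of the parent of feature `j`, and `-1` marks the root), and a covariance matrix from which only the variances and the covariances along tree links are read. These conventions and names are assumptions of the example.

```python
import numpy as np

def tree_log_density(x, mean, cov, parent):
    """log p(x) = log p(x_root) + sum over links of log p(x_j | x_parent(j))."""
    total = 0.0
    for j, p in enumerate(parent):
        if p < 0:
            m, v = mean[j], cov[j, j]            # root: marginal Gaussian
        else:
            b = cov[j, p] / cov[p, p]            # regression coefficient on the parent
            m = mean[j] + b * (x[p] - mean[p])   # conditional mean
            v = cov[j, j] - b * cov[j, p]        # conditional variance
        total += -0.5 * (np.log(2.0 * np.pi * v) + (x[j] - m) ** 2 / v)
    return total

def bayes_classify(x, priors, means, covs, parents):
    """Pick the class with the largest log prior plus tree log density."""
    scores = [np.log(P) + tree_log_density(x, m, c, t)
              for P, m, c, t in zip(priors, means, covs, parents)]
    return int(np.argmax(scores))
```

Setting every entry of `parent` to `-1` gives the independent-features case (3) of the experiment, while a different `parent` list per class allows each class to have its own tree.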

The computed confusion matrices and the classification accuracies on the training and test sets for each of the segments processed are listed in table 6-1. From table 6-1, it is seen that, in general, better classification accuracies are obtained on the training set when the full covariance matrix is used without approximating the densities, whereas improved classification accuracies are obtained on the test set when the densities are approximated with optimal dependent feature trees. This might be because a large number of parameters must be estimated from the limited training set when the full covariance matrices are used.

Figure 6-3.- Arbitrary dependent feature tree
used in the experiment.

One of the important objectives in the classification of remotely sensed agricultural imagery data is to estimate the proportion of the class of interest in the image. The ratio of the variance of the estimated proportion using machine classification to the variance of the estimated proportion using simple random sampling is called the variance reduction factor $R$ (ref. 1). The quantity $R$ can be viewed as an indication of how much the machine classification improves the proportion estimation. The computed variance reduction factors for each of the segments processed are listed in table 6-2. From table 6-2, it is seen that the variance reduction factor consistently improves when the densities are approximated with dependent feature trees, compared to the other cases.

7. CONCLUDING REMARKS


In the classification of imagery data, such as in the machine processing of remotely sensed multispectral scanner data, unsupervised classification techniques have been found to be effective. The application of clustering techniques for the analysis of imagery data essentially involves two steps: (1) clustering the data or partitioning the image into its inherent modes and (2) giving the probabilistic class labels to the resulting clusters. In practice, it is observed that fields are relatively easy to label when compared to pixels.

Several researchers have investigated methods for locating fields in the 
imagery data. Recently, considerable interest has been shown in developing 
techniques for probabilistically labeling the clusters using information from a 
given set of labeled patterns and, also, from a given set of labeled fields. 

In decomposing the mixture density of the data into its normal component densi- 
ties, the parameters of the component densities and the a priori probabilities 
of the modes are iteratively computed using maximum likelihood equations coupled 
with a split and merge sequence. The updating of the parameters is usually 
stopped after a few iterations; and for practical data, a large number of param- 
eters must be estimated. For a fixed sample size, the accuracy of estimation 
usually decreases as the number of parameters to be estimated increases. 

To overcome the above shortcomings, it is proposed in this paper that the den- 
sities be approximated with first-order dependent feature trees. The dependent 
feature trees can be constructed using criteria based on information measure 
and, also, based on class separability measure. Expressions are derived for 
the criteria when the distributions of the features are Gaussian. Expressions 
also are derived for the covariances between features not connected by a single 
link in the dependent feature tree. 

Different types of nodes are defined in a general dependent feature tree. Maximum likelihood equations are derived for the parameters of the mixture density of the data by approximating the cluster conditional densities with first-order dependent feature trees. The field structure of the data is also taken into account in the decomposition of the mixture density of the data into its normal component densities. Furthermore, experimental results from the processing of remotely sensed multispectral scanner imagery data are presented.

APPENDIX A

DERIVATIONS OF MAXIMUM LIKELIHOOD EQUATIONS FOR THE
PARAMETERS OF A TYPE III FEATURE

In this appendix, maximum likelihood equations for the parameters of a typical 
type III feature are derived. A typical type III node in a general dependent 
feature tree is illustrated in figure A-1. 



Figure A-1.- Illustration of a typical type III node 
in a general dependent feature tree. 

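In figure A-1, $x_2$ is a type III feature. The following is obtained from equation (5-5) by keeping only the terms that involve feature $x_2$ in the product approximation of the density of the $i$th cluster.

$$\begin{aligned}
&\sum_{k=1}^{N}\left\{\log[p_i(x_4^k|x_2^k)] + \log[p_i(x_3^k|x_2^k)] + \log[p_i(x_2^k|x_1^k)]\right\} \\
&\quad= \sum_{k=1}^{N}\left\{\log[p_i(x_2^k,x_4^k)] + \log[p_i(x_2^k,x_3^k)] + \log[p_i(x_1^k,x_2^k)] - 2\log[p_i(x_2^k)] - \log[p_i(x_1^k)]\right\}
\end{aligned} \tag{A-1}$$

It is assumed that the features of the cluster are Gaussian distributed. That is,

$$\log[p_i(x_r,x_s)] = -\log(2\pi) - \tfrac{1}{2}\log[\Delta_{rs}(i)] - \frac{1}{2\Delta_{rs}(i)}\left\{\sigma_s^2(i)[x_r - u_r(i)]^2 - 2\sigma_{rs}(i)[x_r - u_r(i)][x_s - u_s(i)] + \sigma_r^2(i)[x_s - u_s(i)]^2\right\} \tag{A-2}$$

where

$$\Delta_{rs}(i) = \sigma_r^2(i)\sigma_s^2(i) - \sigma_{rs}^2(i) \tag{A-3}$$

Using equation (A-2) in equation (A-1) yields

$$\frac{\partial L}{\partial\theta_i} = \sum_{k=1}^{N} P(\omega_i|X_k,\theta)\,\frac{\partial S_k}{\partial\theta_i} \tag{A-4}$$

where

$$\begin{aligned}
S_k = -\tfrac{1}{2}\Big\{
&\log[\Delta_{24}(i)] + \frac{1}{\Delta_{24}(i)}\left(\sigma_4^2(i)[x_2^k - u_2(i)]^2 - 2\sigma_{24}(i)[x_2^k - u_2(i)][x_4^k - u_4(i)] + \sigma_2^2(i)[x_4^k - u_4(i)]^2\right) \\
&+ \log[\Delta_{23}(i)] + \frac{1}{\Delta_{23}(i)}\left(\sigma_3^2(i)[x_2^k - u_2(i)]^2 - 2\sigma_{23}(i)[x_2^k - u_2(i)][x_3^k - u_3(i)] + \sigma_2^2(i)[x_3^k - u_3(i)]^2\right) \\
&+ \log[\Delta_{21}(i)] + \frac{1}{\Delta_{21}(i)}\left(\sigma_1^2(i)[x_2^k - u_2(i)]^2 - 2\sigma_{21}(i)[x_2^k - u_2(i)][x_1^k - u_1(i)] + \sigma_2^2(i)[x_1^k - u_1(i)]^2\right) \\
&- 2\log[\sigma_2^2(i)] - \frac{2}{\sigma_2^2(i)}[x_2^k - u_2(i)]^2\Big\}
\end{aligned} \tag{A-5}$$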

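Letting $\theta_i = u_2(i)$, from equation (A-5), the following is obtained.

$$\begin{aligned}
\frac{\partial S_k}{\partial u_2(i)} ={}& \frac{1}{\Delta_{24}(i)}\left\{\sigma_4^2(i)[x_2^k - u_2(i)] - \sigma_{24}(i)[x_4^k - u_4(i)]\right\}
+ \frac{1}{\Delta_{23}(i)}\left\{\sigma_3^2(i)[x_2^k - u_2(i)] - \sigma_{23}(i)[x_3^k - u_3(i)]\right\} \\
&+ \frac{1}{\Delta_{21}(i)}\left\{\sigma_1^2(i)[x_2^k - u_2(i)] - \sigma_{21}(i)[x_1^k - u_1(i)]\right\}
- \frac{2}{\sigma_2^2(i)}[x_2^k - u_2(i)]
\end{aligned} \tag{A-6}$$

Substituting equation (A-6) in equation (A-4) and equating the result to zero yields the following maximum likelihood equation for the mean of feature $x_2$ of cluster $i$.

$$u_2(i) = \frac{\displaystyle\sum_{k=1}^{N} P(\omega_i|X_k,\theta)\left\{x_2^k + \frac{\displaystyle\frac{\sigma_{21}(i)}{\Delta_{21}(i)}[x_1^k - u_1(i)] + \frac{\sigma_{23}(i)}{\Delta_{23}(i)}[x_3^k - u_3(i)] + \frac{\sigma_{24}(i)}{\Delta_{24}(i)}[x_4^k - u_4(i)]}{\displaystyle\frac{2}{\sigma_2^2(i)} - \frac{\sigma_1^2(i)}{\Delta_{21}(i)} - \frac{\sigma_3^2(i)}{\Delta_{23}(i)} - \frac{\sigma_4^2(i)}{\Delta_{24}(i)}}\right\}}{\displaystyle\sum_{k=1}^{N} P(\omega_i|X_k,\theta)} \tag{A-7}$$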


Letting $\theta_i = \sigma_2^2(i)$ in equation (A-4) and equating the result to zero yields the following after simplification.

(A-8)

Similarly, differentiating $L$ with respect to $\sigma_{2j}(i)$, $j = 1,3,4$, and equating the resulting expression to zero yields

(A-9)

Using equation (A-9) for $j = 1,3,4$ in equation (A-8) yields, after simplification, the following maximum likelihood equation for $\sigma_2^2(i)$.

(A-10)

Proceeding as in the derivation of equations (A-9) and (A-10), it can easily be shown that the maximum likelihood equations for the mean $u_s(i)$ and variance $\sigma_s^2(i)$ of feature $x_s$ of cluster $i$ of figure 5-4 are equations (5-17) and (5-16), respectively.

APPENDIX B

MAXIMUM LIKELIHOOD EQUATIONS WITH FIELD STRUCTURE

In practical applications of pattern recognition, such as in the classification 
of remotely sensed agricultural imagery data, one of the difficult problems is 
to obtain labels for the training patterns. The labels for the training pat- 
terns are usually provided by an analyst-interpreter by examining imagery films 
and using some other information such as historic information and crop calendar 
models. Agricultural imagery data usually have a field-like structure, and it 
is observed that fields are relatively easy to label when compared to pixels. 
Recently, considerable interest has been shown in developing techniques for 
locating fields in the imagery data (refs. 2-4) and in developing methods for the
probabilistic labeling (refs. 10, 11) of cluster distributions using information 
from a given set of labeled fields. Once the fields are located by a field- 
finding algorithm, the problem of fitting a mixture of Gaussian density 
functions to the data by taking into account the field structure of the data is 
considered in this appendix. 


It is assumed that there are $f$ fields in the data. Let the $j$th field be denoted by $F_j$; let it contain $N_j$ pixels; and let $X_{jk}$, $k = 1,2,\ldots,N_j$, be their spectral vectors. Let $m$ be the number of clusters in the data. Let $P(\omega_i)$ and $p(X|\omega_i)$ be the a priori probability that a field belongs to cluster $\omega_i$ and the cluster conditional densities, respectively. Let $\bar{X}_j$ be the concatenated vector of spectral vectors of the pixels in the $j$th field. That is,

$$\bar{X}_j = \begin{bmatrix} X_{j1} \\ X_{j2} \\ \vdots \\ X_{jN_j} \end{bmatrix} \tag{B-1}$$

It is assumed that the fields are independent. Then, the joint density of $f$ fields is given by

$$p(\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_f) = \prod_{j=1}^{f} p(\bar{X}_j) \tag{B-2}$$

The mixture density $p(\bar{X}_j)$ can be written as

$$p(\bar{X}_j) = \sum_{\ell=1}^{m} P(\omega_\ell)\,p(\bar{X}_j|\omega_\ell) \tag{B-3}$$

If it is assumed that the spectral vectors of the pixels in each field are cluster conditionally independent, then

$$p(\bar{X}_j|\omega_\ell) = \prod_{k=1}^{N_j} p(X_{jk}|\omega_\ell) \tag{B-4}$$

Using equations (B-3) and (B-4), the joint density of equation (B-2) can be written as follows.

$$p(\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_f) = \prod_{j=1}^{f}\left[\sum_{\ell=1}^{m} P(\omega_\ell)\prod_{k=1}^{N_j} p(X_{jk}|\omega_\ell)\right] \tag{B-5}$$

Since the logarithm is a monotonic function of its argument, taking the log of both sides of equation (B-5) and denoting it by $L$ results in

$$L = \sum_{j=1}^{f}\log\left[\sum_{\ell=1}^{m} P(\omega_\ell)\prod_{k=1}^{N_j} p(X_{jk}|\omega_\ell)\right] \tag{B-6}$$

From equation (B-6), which is similar to equation (2-7), the maximum likelihood equation for the probability that a field belongs to a cluster can easily be obtained as the following.

$$P(\omega_i) = \frac{1}{f}\sum_{j=1}^{f}\frac{P(\omega_i)\,p(\bar{X}_j|\omega_i)}{p(\bar{X}_j)} \tag{B-7}$$

If $\theta_i$ is a parameter of the density function of the $i$th cluster, differentiating $L$ of equation (B-6) with respect to $\theta_i$ yields the following.

$$\frac{\partial L}{\partial\theta_i} = \sum_{j=1}^{f} P(\omega_i|\bar{X}_j)\sum_{k=1}^{N_j}\frac{\partial}{\partial\theta_i}\left\{\log[p(X_{jk}|\omega_i)]\right\} \tag{B-8}$$

If the probability density functions of the clusters are Gaussian [i.e., $p(X|\omega_i) \sim N(U_i,\Sigma_i)$], from equation (B-8), the maximum likelihood equations for the mean and covariance matrix of the densities of the clusters can be shown to be the following.

$$U_i = \frac{\displaystyle\sum_{j=1}^{f} P(\omega_i|\bar{X}_j)\sum_{k=1}^{N_j} X_{jk}}{\displaystyle\sum_{j=1}^{f} N_j\,P(\omega_i|\bar{X}_j)} \tag{B-9}$$

and

$$\Sigma_i = \frac{\displaystyle\sum_{j=1}^{f} P(\omega_i|\bar{X}_j)\left[\sum_{k=1}^{N_j}(X_{jk} - U_i)(X_{jk} - U_i)^T\right]}{\displaystyle\sum_{j=1}^{f} N_j\,P(\omega_i|\bar{X}_j)} \tag{B-10}$$

where

$$P(\omega_i|\bar{X}_j) = \frac{P(\omega_i)\,p(\bar{X}_j|\omega_i)}{p(\bar{X}_j)} \tag{B-11}$$
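One pass over the maximum likelihood equations with field structure can be sketched in Python as follows. Each field is a hypothetical array of shape `(N_j, d)`; the field posteriors are computed in the log domain for numerical stability, and the covariance update uses the newly computed mean. Both choices, along with all function and variable names, are assumptions of this sketch and not details given in the report.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Row-wise log of the multivariate normal density N(mean, cov)."""
    d = X - mean
    _, logdet = np.linalg.slogdet(cov)
    sol = np.linalg.solve(cov, d.T).T
    return -0.5 * (X.shape[1] * np.log(2.0 * np.pi) + logdet + np.sum(d * sol, axis=1))

def field_update(fields, priors, means, covs):
    """One update of priors, means, and covariances using field posteriors."""
    m = len(priors)
    # log P(w_i) + sum_k log p(X_jk | w_i) for every field j and cluster i
    log_joint = np.array([[np.log(priors[i]) + log_gauss(F, means[i], covs[i]).sum()
                           for i in range(m)] for F in fields])
    log_joint -= log_joint.max(axis=1, keepdims=True)     # guard against underflow
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)               # P(w_i | field j)
    sizes = np.array([len(F) for F in fields])
    new_priors = post.mean(axis=0)                         # average field posterior
    new_means, new_covs = [], []
    for i in range(m):
        denom = np.sum(sizes * post[:, i])                 # sum_j N_j P(w_i | field j)
        mu = sum(post[j, i] * F.sum(axis=0) for j, F in enumerate(fields)) / denom
        S = sum(post[j, i] * (F - mu).T @ (F - mu) for j, F in enumerate(fields)) / denom
        new_means.append(mu)
        new_covs.append(S)
    return new_priors, new_means, new_covs, post
```

Because every pixel in a field shares one posterior, a field of even moderate size is assigned almost entirely to a single cluster, which is the practical effect of taking the field structure into account.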


If the probability density functions of the clusters are approximated with first-order dependent feature trees, maximum likelihood equations for the parameters of the densities (similar to those developed in section 5) can easily be obtained from equations (B-8) and (3-1) by taking into account the field structure of the data.

APPENDIX C

DEPENDENT FEATURE TREES WITH THE NODES
REPRESENTING FEATURE SUBSETS

Very often it is necessary to have each node of a dependent feature tree 
represent a set of features instead of one feature. For example, in remote 
sensing, the satellite makes multiple passes over a given area and, at each 
acquisition, gathers several channels of data. In some instances, it is 
desirable to have each node of a dependent feature tree represent a set of 
features (e.g., the set of features corresponding to an acquisition). In this 
appendix, expressions are developed for the mutual information between the 
feature subsets and for the covariance between the feature subsets when a path 
connects them in a dependent feature tree. It is assumed that the features are 
Gaussian distributed. 

Let the components of feature vectors $X_i$ and $X_j$ be the sets of features represented by nodes $i$ and $j$, respectively. Let $n_i$ and $n_j$ be the dimensionality of vectors $X_i$ and $X_j$, respectively. If the feature vector $X_i$ is normally distributed, its probability density $p(X_i)$ can be represented as

$$p(X_i) \sim N(U_i,\Sigma_i) \tag{C-1}$$

where $U_i$ is the mean vector and $\Sigma_i$ is the covariance matrix.


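Let $Z = (X_i^T, X_j^T)^T$. Then,

$$p(Z) \sim N(U_z,\Sigma_z) \tag{C-2}$$

where

$$U_z = \begin{bmatrix} U_i \\ U_j \end{bmatrix}, \qquad \Sigma_z = \begin{bmatrix} \Sigma_i & \Sigma_{ij} \\ \Sigma_{ij}^T & \Sigma_j \end{bmatrix} \tag{C-3}$$

and $\Sigma_{ij}$ is the cross-covariance matrix of $X_i$ and $X_j$.

The mutual information between feature vectors $X_i$ and $X_j$ can be written as

$$I(X_i,X_j) = \int\!\!\int p(X_i,X_j)\,\log\left[\frac{p(X_i,X_j)}{p(X_i)\,p(X_j)}\right] dX_i\,dX_j = \frac{1}{2}\log\left[\frac{|\Sigma_i|\,|\Sigma_j|}{|\Sigma_z|}\right]$$

where $|\Sigma_z|$ is the determinant of the matrix $\Sigma_z$. Let $y$ and $v$ be zero mean normal random vectors. That is, $p(y) \sim N(0,C_y)$ and $p(v) \sim N(0,C_v)$. Let $Z = (y^T, v^T)^T$ and $p(Z) \sim N(0,C_z)$. Let

$$C_z = \begin{bmatrix} C_y & C_{yv} \\ C_{yv}^T & C_v \end{bmatrix}$$

and

$$C_z^{-1} = \begin{bmatrix} Q_y & Q_{yv} \\ Q_{yv}^T & Q_v \end{bmatrix} \tag{C-6}$$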


Consider

$$p(y|v) = \frac{p(Z)}{p(v)} = \text{constant} \times \exp\left\{-\tfrac{1}{2}(y - \hat{y})^T Q_y (y - \hat{y})\right\} \tag{C-7}$$

where

$$\hat{y} = -Q_y^{-1} Q_{yv}\, v \tag{C-8}$$

Thus, the density $p(y|v)$ is Gaussian with the mean $\hat{y}$ and the covariance matrix $Q_y^{-1}$. Following a similar argument, it can easily be shown that, if $X_i$ is normally distributed, $p(X_i|X_j)$ is normally distributed with the mean $[U_i - Q_i^{-1}Q_{ij}(X_j - U_j)]$ and the covariance matrix $Q_i^{-1}$. Now expressions for the covariance between the feature subsets, when a path connects their representative nodes in a dependent feature tree, can be derived as in section 4.2. For example, if $X_4$ and $X_7$ are Gaussian random vectors, similar to equation (4-3), the following can easily be obtained.

$$\int (X_7 - U_7)\,p(X_7|X_4)\,dX_7 = -Q_7^{-1}Q_{74}(X_4 - U_4) \tag{C-9}$$

Thus, expressions similar to equations (4-8) and (4-9) can easily be obtained. 
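The two results of this appendix, the mutual information between Gaussian feature subsets and the conditional density expressed through blocks of the inverse covariance matrix, can be checked numerically with the short Python sketch below. The function names, the argument layout (the first `n_i` components of `Z` form $X_i$), and the use of NumPy are assumptions of this example.

```python
import numpy as np

def subset_mutual_information(cov_z, n_i):
    """0.5 * log(|S_i| |S_j| / |S_z|) for jointly Gaussian Z = (X_i, X_j)."""
    _, logdet_i = np.linalg.slogdet(cov_z[:n_i, :n_i])
    _, logdet_j = np.linalg.slogdet(cov_z[n_i:, n_i:])
    _, logdet_z = np.linalg.slogdet(cov_z)
    return 0.5 * (logdet_i + logdet_j - logdet_z)

def conditional_from_precision(cov_z, mean_z, n_i, x_j):
    """Mean and covariance of p(X_i | X_j = x_j) from blocks of inv(cov_z)."""
    Q = np.linalg.inv(cov_z)
    Q_i = Q[:n_i, :n_i]                 # block of the inverse belonging to X_i
    Q_ij = Q[:n_i, n_i:]                # off-diagonal block of the inverse
    cond_cov = np.linalg.inv(Q_i)
    cond_mean = mean_z[:n_i] - cond_cov @ Q_ij @ (x_j - mean_z[n_i:])
    return cond_mean, cond_cov
```

The conditional mean and covariance computed this way agree with the more familiar covariance-based expressions, and for two scalar features with correlation coefficient $\rho$ the mutual information reduces to $-\tfrac{1}{2}\log(1-\rho^2)$.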